import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from statsmodels.graphics.tsaplots import acf as acf_func
"ggplot")
plt.style.use(import arch
import yfinance as yf
import datetime as dt
from statsmodels.tsa.arima.model import ARIMA
We need to model volatility for risk management and trading.
ARCH and GARCH
Conditional Heteroskedasticity is the academic term for volatility clustering. \(\text{ARCH}(1)\) formulation is
\[ \epsilon_t= \sigma_t w_t \] where \(\{w_t\}\) is a white noise process.
\[ \sigma_t^2=\alpha_0 + \alpha_1 \epsilon_{t-1}^2 \] \(\{\epsilon_t\}\) is autoregressive conditional heteroskedastic process with order unity.
A slight modification will show that how \(\text{ARCH}\) models volatility \[\begin{aligned} \operatorname{Var}\left(\epsilon_t\right) & =\mathrm{E}\left[\epsilon_t^2\right]-\left(\mathrm{E}\left[\epsilon_t\right]\right)^2 \\ & =\mathrm{E}\left[\epsilon_t^2\right] \\ & =\mathrm{E}\left[w_t^2\right] \mathrm{E}\left[\alpha_0+\alpha_1 \epsilon_{t-1}^2\right] \\ & =\mathrm{E}\left[\alpha_0+\alpha_1 \epsilon_{t-1}^2\right] \\ & =\alpha_0+\alpha_1 \operatorname{Var}\left(\epsilon_{t-1}\right) \end{aligned}\]Simply put \[ \operatorname{Var}=\alpha_0+\alpha_1 \operatorname{Var}\left(\epsilon_{t-1}\right) \] which is also an \(\text{AR}(1)\).
For \(\text{AR}(p)\) \[ \epsilon_t=w_t \sqrt{\alpha_0+\sum_{i=1}^p \alpha_p \epsilon_{t-i}^2} \]
The counterpart of \(\text{ARMA}\) is Generalised Autoregressive Conditional Heteroskedastic (\(\text{GARCH}\)), usually formulated as \[ \begin{align} \epsilon_t&= \sigma_t w_t\\ \sigma_t^2&=\alpha_0+\sum_{i=1}^q \alpha_i \epsilon_{t-i}^2+\sum_{j=1}^p \beta_j \sigma_{t-j}^2 \end{align} \] We say \(\{\epsilon_t\}\) is a \(\text{GARCH}(p, q)\) process.
GARCH(1, 1) Simulation
\[\begin{aligned} \epsilon_t & =\sigma_t w_t \\ \sigma_t^2 & =\alpha_0+\alpha_1 \epsilon_{t-1}^2+\beta_1 \sigma_{t-1}^2 \end{aligned}\]If \(\alpha_1+\beta_1<1\), the system is stable, otherwise unstable.
Generate \(\text{GARCH}(1, 1)\) data.
def gen_garch11(a0, a1, b1, N):
= np.random.randn(N)
w = np.zeros(N)
epsilon = np.zeros(N)
sigma_sq
for i in range(1, N):
= a0 + a1 * epsilon[i - 1] ** 2 + b1 * sigma_sq[i - 1]
sigma_sq[i] = w[i] * np.sqrt(sigma_sq[i])
epsilon[i] return epsilon
= gen_garch11(a0=0.2, a1=0.3, b1=0.4, N=1000) epsilon
Plot generated data and ACFs.
= acf_func(epsilon, nlags=50)
epsilon_acf = acf_func(epsilon**2, nlags=50) epsilon_acf_sq
= plt.subplots(figsize=(12, 12), nrows=3, ncols=1)
fig, ax 0].plot(epsilon, label="$\epsilon$")
ax[
1].bar(np.arange(len(epsilon_acf)), epsilon_acf, label="$\epsilon$ ACF")
ax[1].axhline(np.std(epsilon_acf[1:]) * 1.96, ls="--", color="b")
ax[1].axhline(-np.std(epsilon_acf[1:]) * 1.96, ls="--", color="b")
ax[
2].bar(np.arange(len(epsilon_acf)), epsilon_acf_sq, label="$\epsilon^2$ ACF")
ax[2].axhline(np.std(epsilon_acf_sq[1:]) * 1.96, ls="--", color="b")
ax[2].axhline(-np.std(epsilon_acf_sq[1:]) * 1.96, ls="--", color="b")
ax[for i in range(3):
ax[i].legend() plt.show()
<>:2: SyntaxWarning:
invalid escape sequence '\e'
<>:4: SyntaxWarning:
invalid escape sequence '\e'
<>:8: SyntaxWarning:
invalid escape sequence '\e'
<>:2: SyntaxWarning:
invalid escape sequence '\e'
<>:4: SyntaxWarning:
invalid escape sequence '\e'
<>:8: SyntaxWarning:
invalid escape sequence '\e'
/tmp/ipykernel_4645/683653153.py:2: SyntaxWarning:
invalid escape sequence '\e'
/tmp/ipykernel_4645/683653153.py:4: SyntaxWarning:
invalid escape sequence '\e'
/tmp/ipykernel_4645/683653153.py:8: SyntaxWarning:
invalid escape sequence '\e'
Note that \(\epsilon\) looks like white noise, in contrast \(\epsilon^2\) shows autocorrelation and volatility clustering.
= arch.arch_model(epsilon, mean="Zero", vol="GARCH", p=1, o=0, q=1)
model = model.fit() res
Iteration: 1, Func. Count: 5, Neg. LLF: 827535.1835085036
Iteration: 2, Func. Count: 10, Neg. LLF: 1151.346750915227
Iteration: 3, Func. Count: 15, Neg. LLF: 1150.5779430693185
Iteration: 4, Func. Count: 21, Neg. LLF: 1175.1702226992597
Iteration: 5, Func. Count: 26, Neg. LLF: 1112.519073761109
Iteration: 6, Func. Count: 30, Neg. LLF: 1112.5190355232094
Iteration: 7, Func. Count: 34, Neg. LLF: 1112.519033642905
Iteration: 8, Func. Count: 37, Neg. LLF: 1112.519033642824
Optimization terminated successfully (Exit mode 0)
Current function value: 1112.519033642905
Iterations: 8
Function evaluations: 37
Gradient evaluations: 8
print(res.summary())
Zero Mean - GARCH Model Results
==============================================================================
Dep. Variable: y R-squared: 0.000
Mean Model: Zero Mean Adj. R-squared: 0.001
Vol Model: GARCH Log-Likelihood: -1112.52
Distribution: Normal AIC: 2231.04
Method: Maximum Likelihood BIC: 2245.76
No. Observations: 1000
Date: Sun, Oct 27 2024 Df Residuals: 1000
Time: 17:40:33 Df Model: 0
Volatility Model
==========================================================================
coef std err t P>|t| 95.0% Conf. Int.
--------------------------------------------------------------------------
omega 0.1455 4.705e-02 3.093 1.984e-03 [5.330e-02, 0.238]
alpha[1] 0.2516 5.495e-02 4.578 4.684e-06 [ 0.144, 0.359]
beta[1] 0.5074 0.112 4.520 6.191e-06 [ 0.287, 0.727]
==========================================================================
Covariance estimator: robust
Compare estimated coefficients with our preset true value.
res.params
omega 0.145524
alpha[1] 0.251595
beta[1] 0.507414
Name: params, dtype: float64
Fitting Nasdaq Data
= yf.download(
nasdaq "^IXIC"],
[=dt.datetime.today() - dt.timedelta(days=1500),
start=dt.datetime.today(),
end=True,
progress="inline",
actions="1d",
interval )
[*********************100%***********************] 1 of 1 completed
"log_ret"] = np.log(nasdaq["Adj Close"]) - np.log(nasdaq["Adj Close"].shift())
nasdaq[= nasdaq.dropna()
nasdaq = acf_func(nasdaq["log_ret"], nlags=50) nasdaq_acf
= plt.subplots(figsize=(12, 12), nrows=3, ncols=1)
fig, ax 0].plot(nasdaq["Adj Close"], label="nasdaq 100")
ax[1].plot(nasdaq["log_ret"], label="Log Return")
ax[
2].bar(np.arange(len(nasdaq_acf)), nasdaq_acf, label="nasdaq Return ACF")
ax[2].axhline(np.std(nasdaq_acf[1:]) * 1.96, ls="--", color="b")
ax[2].axhline(-np.std(nasdaq_acf[1:]) * 1.96, ls="--", color="b")
ax[for i in range(3):
ax[i].legend() plt.show()
Fit the model with \(\text{ARIMA}\).
= pd.DatetimeIndex(nasdaq.index).to_period(
nasdaq.index "D"
# without it, pandas will return warning )
= ARIMA(nasdaq["log_ret"], order=(2, 0, 2))
mod_obj = mod_obj.fit() results
Now we want to see how much can’t \(\text{ARIMA}\) explain away in residual.
= acf_func(results.resid, nlags=50)
resid_acf = acf_func(results.resid**2, nlags=50) resid_sq_acf
= plt.subplots(figsize=(12, 8), nrows=2, ncols=1)
fig, ax 0].bar(np.arange(len(resid_acf)), resid_acf, label="Residual ACF")
ax[0].axhline(np.std(resid_acf[1:]) * 1.96, ls="--", color="b")
ax[0].axhline(-np.std(resid_acf[1:]) * 1.96, ls="--", color="b")
ax[
1].bar(np.arange(len(resid_sq_acf)), resid_sq_acf, label="Residual Squared ACF")
ax[1].axhline(np.std(resid_sq_acf[1:]) * 1.96, ls="--", color="b")
ax[1].axhline(-np.std(resid_sq_acf[1:]) * 1.96, ls="--", color="b")
ax[for i in range(2):
ax[i].legend() plt.show()
Auto correlation presents in the squared residuals, we are able to confirm conditional heteroskedastic behavior is present in the diff log return series of the Nasdaq.
Therefore, we can fit a \(\text{GARCH}\) model. Note, we are fitting the log return, not the residuals of \(\text{ARIMA}\).
= arch.arch_model(
model_garch "log_ret"], mean="Zero", vol="GARCH", p=1, o=0, q=1
nasdaq[
)= model.fit() res
Iteration: 1, Func. Count: 5, Neg. LLF: 827535.1835085036
Iteration: 2, Func. Count: 10, Neg. LLF: 1151.346750915227
Iteration: 3, Func. Count: 15, Neg. LLF: 1150.5779430693185
Iteration: 4, Func. Count: 21, Neg. LLF: 1175.1702226992597
Iteration: 5, Func. Count: 26, Neg. LLF: 1112.519073761109
Iteration: 6, Func. Count: 30, Neg. LLF: 1112.5190355232094
Iteration: 7, Func. Count: 34, Neg. LLF: 1112.519033642905
Iteration: 8, Func. Count: 37, Neg. LLF: 1112.519033642824
Optimization terminated successfully (Exit mode 0)
Current function value: 1112.519033642905
Iterations: 8
Function evaluations: 37
Gradient evaluations: 8
= res.resid**2 resid_garch_sq
= acf_func(resid_garch_sq, nlags=50) resid_garch_sq_acf
= plt.subplots(figsize=(12, 4))
fig, ax len(resid_garch_sq_acf)), resid_garch_sq_acf)
ax.bar(np.arange(1:]) * 1.96, ls="--", color="b")
ax.axhline(np.std(resid_garch_sq_acf[-np.std(resid_garch_sq_acf[1:]) * 1.96, ls="--", color="b")
ax.axhline( plt.show()
The residuals still shows autocorrelation, it could mean that model is misspecified, we will investigate further later.